Semi-supervised Bootstrapping of Relation Triples from the Web, Query Languages over these Noisy Triples, their Semantics, and Query Execution Systems

Author

  • Amit Kumar Singh
Abstract

Information Extraction (IE) is the process of retrieving structured information from unstructured text. IE has traditionally relied on extensive human intervention to extract a small set of predefined relations from a corpus. With the advent of the Web, the methods and goals of IE have shifted, with increasing focus on the following challenges:

1. Domain-independent/Open Information Extraction (minimizing or obviating human intervention).
2. Leveraging redundant (implicitly structured) information on the ever-growing Web.
3. Dealing with the high level of ambiguity and noise in natural language.
4. Scaling IE systems to web-size dynamic corpora.
5. Extracting and integrating relations of arbitrary arity.
6. Efficiently retrieving answers to arbitrary structured queries on the extracted data/schema.

In the remainder of this document, I highlight some of the significant developments made in this area, from domain-specific, hand-tagged IE systems to the state-of-the-art self-supervised Open IE framework and beyond. Instead of describing the algorithms used in these IE systems, I have tried to list some significant characteristics inferred while studying them.

Similar resources

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

As the web grows, establishing a general framework for data mining and retrieving structured data from the Web poses many challenges. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...

A Survey on Models and Query Languages for Temporally Annotated RDF

In this paper, we provide a survey on the models and query languages for temporally annotated RDF. In most of the works, a temporally annotated RDF ontology is essentially a set of RDF triples associated with temporal constraints, where, in the simplest case, a temporal constraint is a validity temporal interval. However, a temporally annotated RDF ontology may also be a set of triples connecti...

RDF triples management in roStore

This paper tackles issues encountered in storing and querying services dealing with information described with Semantic Web languages, e.g. OWL and RDF(S). Our work considers RDF triples stored in relational databases. We assume that depending on the applications and queries asked to RDF triple stores, different partitioning approaches can be considered: either storing all triples in ...

A relational algebra for SPARQL

The SPARQL query language for RDF provides Semantic Web developers with a powerful tool to extract information from large datasets. This report describes a transformation from SPARQL into the relational algebra, an abstract intermediate language for the expression and analysis of queries. This makes existing work on query planning and optimization available to SPARQL implementors. A further tra...

Publication date: 2007